125 research outputs found

    Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving

    In this paper we present the first results of a pilot experiment in the capture and interpretation of multimodal signals from human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye gaze, posture, emotion and other physiological signals can be used to model the cognitive state of subjects, and to explore the integration of multiple sensor modalities to improve the reliability of detecting human displays of awareness and emotion. We observed chess players engaged in problems of increasing difficulty while recording their behavior. Such recordings can be used to estimate a participant's awareness of the current situation and to predict their ability to respond effectively to challenging situations. Results show that a multimodal approach is more accurate than a unimodal one: by combining body posture, visual attention and emotion, the multimodal approach reaches up to 93% accuracy in determining a player's chess expertise, whereas the unimodal approach reaches 86%. Finally, this experiment validates the use of our equipment as a general and reproducible tool for the study of participants engaged in screen-based interaction and/or problem solving.
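
    As an illustration only, and not the authors' actual pipeline, the kind of comparison reported above (a unimodal classifier versus one fed with fused posture, gaze and emotion features) could be sketched as follows; the feature dimensions, the synthetic data and the random-forest choice are all assumptions.

```python
# Hypothetical sketch: unimodal vs. fused multimodal expertise classification.
# Feature dimensions, labels and the classifier are assumptions, and the data
# is synthetic; this is not the authors' pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200                                  # observation windows (synthetic)
posture = rng.normal(size=(n, 6))        # e.g. torso/head pose descriptors
gaze = rng.normal(size=(n, 4))           # e.g. fixation statistics
emotion = rng.normal(size=(n, 7))        # e.g. facial-expression scores
labels = rng.integers(0, 2, size=n)      # 0 = novice, 1 = expert (synthetic)

def accuracy(features):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, features, labels, cv=5).mean()

print("gaze only :", accuracy(gaze))
print("multimodal:", accuracy(np.hstack([posture, gaze, emotion])))
```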

    Deep learning investigation for chess player attention prediction using eye-tracking and game data

    This article reports on an investigation of the use of convolutional neural networks to predict the visual attention of chess players. The visual attention model described in this article was created to generate saliency maps that capture hierarchical and spatial features of the chessboard, in order to predict the fixation probability of individual pixels. Using a skip-layer autoencoder architecture with a unified decoder, we are able to use multiscale features to predict the saliency of parts of the board at different scales, capturing multiple relations between pieces. We used scan-path and fixation data from players engaged in solving chess problems to compute 6,600 saliency maps associated with the corresponding chess piece configurations. This corpus is completed with synthetically generated data from actual games gathered from an online chess platform. Experiments using both scan-paths from chess players and the CAT2000 saliency dataset of natural images highlight several results. Deep features pretrained on natural images were found to be helpful in training visual attention prediction for chess. The proposed neural network architecture is able to generate meaningful saliency maps on unseen chess configurations with good scores on standard metrics. This work provides a baseline for future work on visual attention prediction in similar contexts.
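
    For readers unfamiliar with skip-layer architectures, a minimal sketch of a skip-connection encoder-decoder that maps a board image to a per-pixel saliency map is given below; the layer sizes, input resolution and single skip connection are illustrative assumptions, not the article's exact network.

```python
# Minimal sketch of a skip-connection encoder-decoder producing a per-pixel
# saliency map. Layer sizes and resolution are illustrative assumptions only.
import torch
import torch.nn as nn

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU())
        # skip connection: the decoder sees enc1 features next to upsampled ones
        self.dec1 = nn.Conv2d(16 + 16, 1, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)                      # full-resolution features
        e2 = self.enc2(e1)                     # half-resolution features
        d2 = self.dec2(e2)                     # upsample back to full resolution
        out = self.dec1(torch.cat([d2, e1], dim=1))
        return torch.sigmoid(out)              # per-pixel fixation probability

board = torch.randn(1, 3, 256, 256)            # dummy rendered chessboard image
saliency = SaliencyNet()(board)                # (1, 1, 256, 256) saliency map
print(saliency.shape)
```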

    SocialInteractionGAN: Multi-person Interaction Sequence Generation

    Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator. This architecture allows the discriminator to jointly assess the realism of the interaction and that of individual action sequences. Within each stream, a recurrent network operating on short subsequences endows the output signal with local assessments, better guiding the forthcoming generation. Crucially, contextual information on interacting participants is shared among agents and reinjected into both the generation and the discriminator evaluation processes. We show that the proposed SocialInteractionGAN succeeds in producing highly realistic action sequences of interacting people, comparing favorably to a diversity of recurrent and convolutional discriminator baselines. Evaluations are conducted using modified Inception Score and Fréchet Inception Distance metrics that we specifically designed for discrete sequential generated data. The distribution of generated sequences is shown to closely approach that of real data. In particular, our model properly learns the dynamics of interaction sequences, while exploiting the full range of actions.
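
    The dual-stream idea can be illustrated with a small, assumption-laden sketch: one recurrent stream scores each participant's action sequence, while a second stream scores the pooled interaction. The embedding size, pooling rule and action vocabulary below are invented for illustration and are not the paper's architecture.

```python
# Illustrative dual-stream discriminator sketch (assumptions only): a
# per-person stream and a pooled-interaction stream each produce a realism score.
import torch
import torch.nn as nn

class DualStreamDiscriminator(nn.Module):
    def __init__(self, n_actions=10, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_actions, hidden)
        self.individual = nn.GRU(hidden, hidden, batch_first=True)
        self.interaction = nn.GRU(hidden, hidden, batch_first=True)
        self.head_ind = nn.Linear(hidden, 1)
        self.head_int = nn.Linear(hidden, 1)

    def forward(self, actions):
        # actions: (batch, n_persons, seq_len) integer action ids
        b, p, t = actions.shape
        x = self.embed(actions)                    # (b, p, t, hidden)
        _, h_ind = self.individual(x.reshape(b * p, t, -1))  # per-person realism
        _, h_int = self.interaction(x.mean(dim=1))           # pooled interaction realism
        return self.head_ind(h_ind[-1]), self.head_int(h_int[-1])

scores = DualStreamDiscriminator()(torch.randint(0, 10, (2, 3, 25)))
print(scores[0].shape, scores[1].shape)   # (6, 1) per person, (2, 1) per interaction
```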

    A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

    Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality, while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short- and long-term correlations between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony, both in the landmark domain and in the image domain.
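
    A hedged sketch of what a multi-scale synchrony objective might look like follows: embeddings of speech and landmark-motion windows are compared after average pooling at several temporal scales. The pooling factors and the cosine objective are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a multi-scale synchrony loss over aligned audio and motion
# embeddings; pooling factors and the cosine objective are assumptions.
import torch
import torch.nn.functional as F

def multiscale_sync_loss(audio_emb, motion_emb, scales=(1, 2, 4)):
    # audio_emb, motion_emb: (batch, time, dim) aligned feature sequences
    loss = 0.0
    for s in scales:
        a = F.avg_pool1d(audio_emb.transpose(1, 2), kernel_size=s).transpose(1, 2)
        m = F.avg_pool1d(motion_emb.transpose(1, 2), kernel_size=s).transpose(1, 2)
        loss = loss + (1 - F.cosine_similarity(a, m, dim=-1)).mean()
    return loss / len(scales)

audio = torch.randn(4, 32, 128)    # dummy speech embeddings
motion = torch.randn(4, 32, 128)   # dummy landmark-motion embeddings
print(multiscale_sync_loss(audio, motion))
```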

    Perceptive Services Composition using semantic language and distributed knowledge

    Building applications that compose perceptive services in a pervasive environment can lead to an inextricable problem: the services may have been built by several people, using different programming languages and multiple conventions and protocols. Moreover, services can be volatile, appearing or disappearing while the application is running. This paper proposes the use of a dedicated human-readable semantic language to describe perceptive services. After converting this description into a more common language, one can recruit services using inference engines to build complex applications. In order to increase the robustness of the whole system, descriptions of services are distributed over the network using a cross-language, cross-platform open-source middleware of our own called OMiSCID.
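
    A purely hypothetical sketch of the underlying idea, describing perceptive services in a readable form and recruiting them by the capabilities they provide, is shown below; the description schema and matching rule are invented and do not reflect the OMiSCID format.

```python
# Hypothetical service descriptions and a naive capability-based recruitment
# rule; the schema and names are invented for illustration only.
services = [
    {"name": "Camera",        "provides": ["video.stream"],   "needs": []},
    {"name": "FaceTracker",   "provides": ["face.position"],  "needs": ["video.stream"]},
    {"name": "GazeEstimator", "provides": ["gaze.direction"], "needs": ["face.position"]},
]

def recruit(goal, available):
    """Recursively recruit services whose outputs satisfy `goal`."""
    for svc in available:
        if goal in svc["provides"]:
            plan = [svc["name"]]
            for dep in svc["needs"]:
                plan = recruit(dep, available) + plan
            return plan
    raise LookupError(f"no service provides {goal}")

print(recruit("gaze.direction", services))  # ['Camera', 'FaceTracker', 'GazeEstimator']
```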

    Multi-Sensors Engagement Detection with a Robot Companion in a Home Environment

    Workshop FW1 "Assistance and Service Robotics in a Human Environment", Session 3: Behavioral Modeling and Human/Robot Interaction
    Recognition of intentions is an unconscious cognitive process vital to human communication. This skill enables anticipation and increases the quality of interactive exchanges between humans. Within the context of engagement, i.e. the intention to interact, non-verbal signals are used to communicate this intention to the partner. In this paper, we investigated methods to detect these signals in order to allow a robot to know when it is about to be addressed. Classically, the human's position and speed and the human-robot distance are used to detect engagement. Our hypothesis is that this is not enough in the context of a home environment. The chosen approach integrates multimodal features gathered using a robot enhanced with a Kinect. The evaluation of this new detection method on our corpus, collected in spontaneous conditions, highlights its robustness and validates the use of such a technique in a real environment. Experimental validation shows that the use of multimodal sensors gives better precision and recall than a detector using only spatial and speed features. We also demonstrate that seven multimodal features are sufficient to provide a good engagement detection score.
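
    As an assumption-based illustration of the feature-count finding, the sketch below ranks candidate multimodal features and keeps only the top k before training an engagement classifier; the feature pool, k values, classifier and synthetic data are not the authors' exact setup.

```python
# Illustrative-only sketch: select the top-k multimodal features for an
# engagement classifier; the feature pool and classifier are assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 15))      # distance, speed, head pose, speech, ... (synthetic)
y = rng.integers(0, 2, size=n)    # engaged / not engaged (synthetic labels)

for k in (3, 7, 15):
    clf = make_pipeline(SelectKBest(f_classif, k=k), SVC())
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"top {k:2d} features: accuracy {score:.2f}")
```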

    Autoregressive GAN for Semantic Unconditional Head Motion Generation

    We address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space. Deviating from talking head generation conditioned on audio, which seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that allows obtaining rich head motion sequences while avoiding known caveats associated with GANs. Namely, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator on input pairs drives generation toward better handling of high- and low-frequency signals and less mode collapse. We demonstrate experimentally the relevance of the proposed architecture and compare with models that showed state-of-the-art performance on similar tasks.
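
    The autoregressive, incremental-output idea can be sketched as follows: a recurrent generator emits small pose increments that accumulate into a smooth trajectory. The pose dimensionality, GRU cell and step size are illustrative assumptions only, not the paper's generator.

```python
# Sketch of autoregressive incremental generation: each step adds a small
# delta to the previous pose, keeping the trajectory smooth (assumptions only).
import torch
import torch.nn as nn

class IncrementalHeadMotionGenerator(nn.Module):
    def __init__(self, noise_dim=16, pose_dim=6, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(noise_dim + pose_dim, hidden)
        self.to_delta = nn.Linear(hidden, pose_dim)

    def forward(self, z, steps=50):
        batch = z.shape[0]
        h = torch.zeros(batch, self.cell.hidden_size)
        pose = torch.zeros(batch, self.to_delta.out_features)
        trajectory = []
        for _ in range(steps):
            h = self.cell(torch.cat([z, pose], dim=-1), h)
            pose = pose + 0.1 * torch.tanh(self.to_delta(h))   # small increment
            trajectory.append(pose)
        return torch.stack(trajectory, dim=1)                  # (batch, steps, pose_dim)

motion = IncrementalHeadMotionGenerator()(torch.randn(2, 16))
print(motion.shape)   # torch.Size([2, 50, 6])
```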

    Smartphone-based User Location Tracking in Indoor Environment

    This paper introduces our work in the framework of Track 3 of the IPIN 2016 Indoor Localization Competition, which addresses the smartphone-based tracking problem in an offline manner. Our approach splits path reconstruction into several smaller tasks, including building identification, floor identification, and user direction and speed inference. For each task, a specific subset of the provided log data is used. Evaluation is carried out using a cross-validation scheme. To improve robustness against noisy data, we combine several approaches into one on the basis of their testing results. Testing on the provided training data, we obtain good accuracy on building and floor identification. For the task of tracking the user's position within the floor, the third-quartile distance error is 10 m after 3 minutes of walking.
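
    A minimal sketch of the dead-reckoning step implied by the direction and speed inference tasks is given below; the sampling rate and the synthetic heading and speed estimates are assumptions, and this is not the competition pipeline.

```python
# Minimal dead-reckoning sketch: integrate per-second heading and speed
# estimates into a 2-D path (synthetic inputs, illustrative only).
import numpy as np

def reconstruct_path(headings_rad, speeds_mps, dt=1.0, start=(0.0, 0.0)):
    steps = np.stack([np.cos(headings_rad), np.sin(headings_rad)], axis=1)
    displacements = steps * (speeds_mps * dt)[:, None]
    return np.vstack([start, start + np.cumsum(displacements, axis=0)])

headings = np.deg2rad(np.linspace(0, 90, 180))   # slow left turn over 3 minutes
speeds = np.full(180, 1.2)                       # roughly walking speed, in m/s
path = reconstruct_path(headings, speeds)
print(path.shape, path[-1])                      # 181 points, final position
```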

    Autonomous Robot Controller Using Bitwise GIBBS Sampling

    In this paper we describe a bio-inspired, non-von Neumann controller for a simple sensorimotor robotic system. This controller uses a bitwise version of the Gibbs sampling algorithm to select commands so that the robot can adapt its course of action and avoid perceived obstacles in the environment. The VHDL specification of the controller's circuit implementation is based on stochastic computation to perform Bayesian inference at a low energy cost. We show that the proposed unconventional architecture successfully carries out the obstacle avoidance task and addresses scalability issues observed in previous works.
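
    As a software-level illustration of the inference idea only, the sketch below runs Gibbs sampling over two binary motor-command bits conditioned on obstacle sensor readings; the conditional factors are invented and nothing here reflects the paper's stochastic-hardware implementation.

```python
# Toy Gibbs sampler over binary command bits given obstacle readings; the
# conditionals are invented for illustration, not the paper's model.
import math
import random

def p_bit_is_one(bit_index, state, sensors):
    """Invented conditional: favour turning away from the nearer obstacle."""
    left_obst, right_obst = sensors
    if bit_index == 0:   # turn_left bit
        score = right_obst - left_obst - 0.5 * state[1]
    else:                # turn_right bit
        score = left_obst - right_obst - 0.5 * state[0]
    return 1.0 / (1.0 + math.exp(-4.0 * score))   # logistic squashing

def gibbs_sample_command(sensors, sweeps=100, seed=0):
    random.seed(seed)
    state = [0, 0]                      # [turn_left, turn_right]
    counts = [0, 0]
    for _ in range(sweeps):
        for i in range(2):              # resample each bit given the other
            state[i] = 1 if random.random() < p_bit_is_one(i, state, sensors) else 0
        counts[0] += state[0]
        counts[1] += state[1]
    return counts[0] / sweeps, counts[1] / sweeps

print(gibbs_sample_command(sensors=(0.9, 0.1)))   # obstacle mostly on the left
```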